Exploration of the intractable posterior distributions associated with Bayesian versions of the general linear mixed model is often performed using Markov chain Monte Carlo. In particular, if a conditionally conjugate prior is used, then there is a simple two-block Gibbs sampler available. Roman and Hobert [Linear Algebra Appl. 473 (2015) 54-77] showed that, when the priors are proper and the $X$ matrix has full column rank, the Markov chains underlying these Gibbs samplers are nearly always geometrically ergodic. In this paper, Roman and Hobert's (2015) result is extended by allowing improper priors on the variance components, and, more importantly, by removing all assumptions on the $X$ matrix. So, not only is $X$ allowed to be (column) rank deficient, which provides additional flexibility in parameterizing the fixed effects, it is also allowed to have more columns than rows, which is necessary in the increasingly important situation where $p > N$. The full rank assumption on $X$ is at the heart of Roman and Hobert's (2015) proof. Consequently, the extension to unrestricted $X$ requires a substantially different analysis.

1. Introduction

The general linear mixed model (GLMM) is one of the most frequently applied statistical models. A GLMM with $r$ random factors takes the form

$$Y = X\beta + \sum_{i=1}^{r} Z_i u_i + e,$$

where $Y$ is an observable $N \times 1$ data vector, $X$ and $\{Z_i\}_{i=1}^{r}$ are known matrices, $\beta$ is an unknown $p \times 1$ vector of regression coefficients, $\{u_i\}_{i=1}^{r}$ are independent random vectors whose elements represent the various levels of the random factors in the model, and $e \sim \mathrm{N}_N(0, \lambda_e^{-1} I)$. Assume that $e$ and $u := (u_1^T \cdots u_r^T)^T$ are independent, and that $u \sim \mathrm{N}_q(0, \Lambda^{-1})$, where $u_i$ is $q_i \times 1$, $q = q_1 + \cdots + q_r$, and $\Lambda = \bigoplus_{i=1}^{r} \lambda_{u_i} I_{q_i}$. Letting $Z = (Z_1 \; Z_2 \cdots Z_r)$, we can write $\sum_{i=1}^{r} Z_i u_i = Zu$. Let $\lambda$ denote the vector of precision parameters, i.e., $\lambda = (\lambda_e \; \lambda_{u_1} \cdots \lambda_{u_r})^T$. To rule out degenerate cases, we assume throughout that $N \ge 2$, and that $q_i \ge 2$ for each $i = 1, 2, \ldots, r$. For a book-length treatment of the GLMM, which is sometimes called the variance components model, see [15].

In the Bayesian setting, prior distributions are assigned to the unknown parameters, $\beta$ and $\lambda$. Unfortunately, the Bayes estimators associated with any non-trivial prior cannot be obtained in closed form. This is because such estimators take the form of ratios of high-dimensional, intractable integrals. The dimensionality also precludes the use of classical Monte Carlo methods that require the ability to draw samples directly from the posterior distribution. Instead, the parameter estimates are typically obtained using Markov chain Monte Carlo (MCMC) methods. In particular, when (proper or improper) conditionally conjugate priors are adopted for $\beta$ and $\lambda$, there is a simple block Gibbs sampler that can be used to explore the intractable posterior density. Let $\theta = (\beta^T \; u^T)^T$, and denote the posterior density as $\pi(\theta, \lambda \mid y)$, where $y$ denotes the observed data vector. (Since $u$ is unobservable, it is treated like a parameter.) When the conditionally conjugate priors are adopted, it is straightforward to simulate from $\theta \mid \lambda, y$, and from $\lambda \mid \theta, y$. Indeed, $\theta \mid \lambda, y$ is multivariate normal and, given $(\theta, y)$, the components of $\lambda$ are independent gamma variates. Hence, it is straightforward to simulate a Markov chain, $\{(\theta_n, \lambda_n)\}_{n=0}^{\infty}$, that has $\pi(\theta, \lambda \mid y)$ as its invariant density. Our main results concern the convergence properties of this block Gibbs sampler. We now provide some background about Markov chains on $\mathbb{R}^d$, which will allow us to describe our results and their practical importance.

Let $V = \{V_m\}_{m=0}^{\infty}$ denote a Markov chain with state space $\mathsf{V} \subset \mathbb{R}^d$ and assume the chain is Harris ergodic; that is, $\psi$-irreducible, aperiodic and positive Harris recurrent (see [8] for definitions). Assume further that the chain has a Markov transition density (with respect to Lebesgue measure), $k : \mathsf{V} \times \mathsf{V} \to [0, \infty)$. Then, for any measurable set $A$, we have

$$\Pr(V_{m+1} \in A \mid V_m = v) = \int_A k(v' \mid v)\, dv'.$$

For $m \in \{2, 3, 4, \ldots\}$, the $m$-step Markov transition density (Mtd) is defined inductively as follows

$$k^m(v' \mid v) = \int_{\mathsf{V}} k^{m-1}(v' \mid u)\, k(u \mid v)\, du.$$

Of course, $k^1 \equiv k$, and $k^m(\cdot \mid v)$ is the density of $V_m$ conditional on $V_0 = v$. Suppose that the invariant probability distribution also has a density (with respect to Lebesgue measure), $\kappa : \mathsf{V} \to [0, \infty)$. The chain $V$ is geometrically ergodic if there exist $M : \mathsf{V} \to [0, \infty)$ and $\gamma \in [0, 1)$ such that, for all $m \in \mathbb{N}$,

$$\int_{\mathsf{V}} |k^m(v' \mid v) - \kappa(v')|\, dv' \le M(v)\, \gamma^m. \tag{1}$$

Of course, the quantity on the left-hand side of (1) is the total variation distance between the invariant distribution and the distribution of $V_m$ given $V_0 = v$.
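
As a rough illustration of the bound in (1), the following sketch uses a two-state chain, which is a finite-state analogue and not the continuous-state setting considered in this paper. The transition probabilities `p_` and `q_` are arbitrary illustrative values. For such a chain the $L^1$ distance to stationarity from state 0 equals $2\pi_1 |1 - p - q|^m$, so the roles of $M(v)$ and $\gamma$ are visible explicitly.

```python
import numpy as np

def l1_distance_to_stationary(P, pi, start, m):
    """L1 distance between the m-step distribution from `start` and pi."""
    row = np.linalg.matrix_power(P, m)[start]
    return float(np.abs(row - pi).sum())

# Illustrative two-state chain (values chosen arbitrarily for the sketch).
p_, q_ = 0.3, 0.2
P = np.array([[1.0 - p_, p_],
              [q_, 1.0 - q_]])
pi = np.array([q_, p_]) / (p_ + q_)   # stationary distribution
gamma = abs(1.0 - p_ - q_)            # geometric rate (second eigenvalue)

if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, l1_distance_to_stationary(P, pi, 0, m), 2 * pi[1] * gamma ** m)
```

Here the distance decays exactly geometrically; in general state spaces, (1) only asks for a geometric upper bound with a constant that may depend on the starting value.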

There are many important practical and theoretical benefits of using a geometrically ergodic Markov chain as the basis of one's MCMC algorithm (see, e.g., [3, 6, 9]). Perhaps the most important of these is the ability to construct valid asymptotic standard errors for MCMC-based estimators. Let $h : \mathsf{V} \to \mathbb{R}$ be a function such that $\int_{\mathsf{V}} |h(v)|\, \kappa(v)\, dv < \infty$, and suppose that the chain $V$ is to serve as the basis of an MCMC algorithm for estimating $\omega := \int_{\mathsf{V}} h(v)\, \kappa(v)\, dv < \infty$. Harris ergodicity guarantees that the standard estimator of $\omega$, $\bar{h}_m := \frac{1}{m} \sum_{i=0}^{m-1} h(V_i)$, is strongly consistent. However, Harris ergodicity is not enough to ensure that $\bar{h}_m$ satisfies a central limit theorem (CLT). On the other hand, if $V$ is geometrically ergodic and there exists an $\varepsilon > 0$ such that $\int_{\mathsf{V}} |h(v)|^{2+\varepsilon} \kappa(v)\, dv < \infty$, then $\bar{h}_m$ does indeed satisfy a $\sqrt{m}$-CLT; that is, under these conditions, there exists a positive, finite $\sigma^2$ such that, as $m \to \infty$, $\sqrt{m}(\bar{h}_m - \omega) \stackrel{d}{\to} \mathrm{N}(0, \sigma^2)$. This is extremely important from a practical standpoint because all of the standard methods of calculating valid asymptotic standard errors for $\bar{h}_m$ are based on the existence of this CLT (Flegal, Haran and Jones [3]).
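
One standard way to turn the CLT into a standard error is the method of non-overlapping batch means. The sketch below is a minimal version, assuming roughly $\sqrt{m}$ batches of equal length; the function name `batch_means_se` and that default are choices made here for illustration, not something prescribed in the text.

```python
import numpy as np

def batch_means_se(draws, n_batches=None):
    """Return (estimate, standard error) for the mean of h(V_0), ..., h(V_{m-1})."""
    x = np.asarray(draws, dtype=float)
    m = x.size
    if n_batches is None:
        n_batches = int(np.floor(np.sqrt(m)))   # assumed default: about sqrt(m) batches
    batch_len = m // n_batches
    used = x[: n_batches * batch_len]           # drop the remainder so batches are equal
    batch_avgs = used.reshape(n_batches, batch_len).mean(axis=1)
    estimate = used.mean()
    # batch_len * (sample variance of batch averages) estimates sigma^2 in the CLT.
    sigma2_hat = batch_len * batch_avgs.var(ddof=1)
    return float(estimate), float(np.sqrt(sigma2_hat / used.size))

if __name__ == "__main__":
    # Toy geometrically ergodic chain: a stationary Gaussian AR(1) process.
    rng = np.random.default_rng(0)
    phi, m = 0.5, 20_000
    v = np.empty(m)
    v[0] = rng.standard_normal()
    for t in range(1, m):
        v[t] = phi * v[t - 1] + np.sqrt(1 - phi**2) * rng.standard_normal()
    print(batch_means_se(v))
```

The standard error is only meaningful when a CLT holds for the chain, which is exactly what geometric ergodicity (plus the moment condition) delivers.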

There have been several studies of the convergence properties of the block Gibbs sampler for the GLMM (Johnson and Jones [5], Roman and Hobert [12, 13]). These have resulted in easily-checked sufficient conditions for geometric ergodicity of the underlying Markov chain. However, in all of the studies to date, the matrix $X$ has been assumed to have full column rank. In this paper, we extend the results to the case where $X$ is completely unrestricted. So, not only do we allow for a rank deficient $X$, which provides additional flexibility in parameterizing the fixed effects, we also allow for $X$ with $p > N$, which is necessary in the increasingly important situation where there are more predictors than data points. Two different families of conditionally conjugate priors are considered, one proper and one improper. We now describe our results, beginning with the results for proper priors.

Assume that $\beta$ and the components of $\lambda$ are all a priori independent, and that $\beta \sim \mathrm{N}_p(\mu_\beta, \Sigma_\beta)$, $\lambda_e \sim \mathrm{Gamma}(a_0, b_0)$ and, for $i = 1, \ldots, r$, $\lambda_{u_i} \sim \mathrm{Gamma}(a_i, b_i)$. Our result for proper priors, which is a corollary of Proposition 1 from Section 3, is as follows.

Corollary 1. Under a proper prior, the block Gibbs Markov chain, $\{(\lambda_n, \theta_n)\}_{n=0}^{\infty}$, is geometrically ergodic if:

1. $a_0 > \frac{1}{2}(\mathrm{rank}(Z) - N + 2)$, and

2. $\min\{a_1 + \frac{q_1}{2}, \ldots, a_r + \frac{q_r}{2}\} > \frac{1}{2}(q - \mathrm{rank}(Z)) + 1$.

The conditions of Corollary 1 are quite weak in the sense that they would nearly always 
be satisfied in practice. Indeed, it would typically be the case that rank(Z) — N < —2 
(making the first condition vacuous) and q — rank(Z) is close to zero (making the second 
condition easily satisfied). In fact, if q = rank(Z), which is the case for many standard 
designs, then the second condition is also vacuous. 
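
The two conditions of Corollary 1 depend only on $N$, the block sizes $q_i$, the shape hyperparameters and $\mathrm{rank}(Z)$, so they are easy to check numerically. The following is a minimal sketch; the function name `corollary1_holds` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def corollary1_holds(Z, N, q_sizes, a0, a):
    """Return (cond1, cond2) for the two sufficient conditions of Corollary 1.

    Z is the N x q random-effects design matrix, q_sizes = (q_1, ..., q_r),
    a0 is the shape for lambda_e and a = (a_1, ..., a_r) are the other shapes.
    """
    Z = np.asarray(Z, dtype=float)
    q_sizes = np.asarray(q_sizes, dtype=int)
    a = np.asarray(a, dtype=float)
    q = int(q_sizes.sum())
    if Z.shape != (N, q):
        raise ValueError("Z must be N x (q_1 + ... + q_r)")
    rank_Z = int(np.linalg.matrix_rank(Z))
    cond1 = a0 > 0.5 * (rank_Z - N + 2)
    cond2 = float(np.min(a + q_sizes / 2.0)) > 0.5 * (q - rank_Z) + 1.0
    return bool(cond1), bool(cond2)

if __name__ == "__main__":
    # One-way random effects layout: 3 groups with 4 observations each.
    Z = np.kron(np.eye(3), np.ones((4, 1)))
    print(corollary1_holds(Z, 12, [3], a0=0.1, a=[0.1]))
```

In the one-way layout both conditions hold for any positive shapes, while a small two-way crossed layout with $N = q = 4$ and $\mathrm{rank}(Z) = 3$ requires $a_0 > 1/2$ and $a_i > 1/2$.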

Roman and Hobert [13] (hereafter R&H15) proved this same result under the restrictive assumption that $X$ has full column rank. Moreover, the rank assumption is at the very heart of their proof. Indeed, these authors established a geometric drift condition for the marginal chain, $\{\theta_n\}_{n=0}^{\infty}$, but their drift (Lyapunov) function is only valid when $X$ has full column rank. Our proof is significantly different. We analyze the other marginal chain, $\{\lambda_n\}_{n=0}^{\infty}$, using a drift function that does not involve the matrix $X$. Generally speaking, minor changes in a drift function often lead to significant differences in what one is able to prove. Thus, it is somewhat surprising that we are able to recover exactly the conditions of R&H15. To be fair, we are able to use several of their matrix bounds, but only after extending them to the case where $X$ is unrestricted.

When $X$ does not have full column rank, a flat prior on $\beta$ leads to an improper posterior. Thus, the improper priors that we consider are actually partially proper. In particular, assume again that $\beta$ and the components of $\lambda$ are all a priori independent, and that $\beta \sim \mathrm{N}_p(\mu_\beta, \Sigma_\beta)$. But now take the prior on $\lambda_e$ to be (proportional to) $\lambda_e^{a_0 - 1} e^{-b_0 \lambda_e} I_{(0,\infty)}(\lambda_e)$, and for $i = 1, \ldots, r$, take the prior on $\lambda_{u_i}$ to be $\lambda_{u_i}^{a_i - 1} e^{-b_i \lambda_{u_i}} I_{(0,\infty)}(\lambda_{u_i})$. Assume that $\min\{a_i, b_i\} \le 0$ for at least one $i = 0, 1, \ldots, r$; otherwise, we are back to the proper priors described above. (See [4] for a comprehensive discussion about improper priors for variance components.) Let $W = (X \; Z)$, so that $W\theta = X\beta + Zu$, and define $\mathrm{SSE} = \|y - W\hat\theta\|^2$, where $\hat\theta = (W^T W)^{+} W^T y$ and superscript "+" on a matrix denotes Moore-Penrose inverse. Our result for improper priors, which is another corollary of Proposition 1, is as follows.

Corollary 2. Under an improper prior, the block Gibbs Markov chain, $\{(\lambda_n, \theta_n)\}_{n=0}^{\infty}$, is geometrically ergodic if:

1. $2b_0 + \mathrm{SSE} > 0$,

2. For each $i \in \{1, 2, \ldots, r\}$, either $b_i > 0$ or $a_i < b_i = 0$,

3. $a_0 > \frac{1}{2}(\mathrm{rank}(Z) - N + 2)$, and

4. $\min\{a_1 + \frac{q_1}{2}, \ldots, a_r + \frac{q_r}{2}\} > \frac{1}{2}(q - \mathrm{rank}(Z)) + 1$.

Note that the two conditions of Corollary 1 are exactly the same as the third and 
fourth conditions of Corollary 2. Furthermore, the first two conditions of Corollary 2 are 
necessary for posterior propriety [16], and hence for geometric ergodicity. Consequently, 
the commentary above regarding the weakness of the conditions of Corollary 1 applies 
here as well. 
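
The conditions of Corollary 2 can be checked in the same way, with SSE computed from the Moore-Penrose inverse as defined above. The sketch below is illustrative only: the names `sse` and `corollary2_holds`, and the small numerical tolerance `tol` used to decide whether $2b_0 + \mathrm{SSE}$ is positive in floating point, are assumptions of this example.

```python
import numpy as np

def sse(y, X, Z):
    """SSE = ||y - W theta_hat||^2 with theta_hat = (W^T W)^+ W^T y and W = (X Z)."""
    W = np.hstack([X, Z])
    theta_hat = np.linalg.pinv(W.T @ W) @ W.T @ y
    resid = y - W @ theta_hat
    return float(resid @ resid)

def corollary2_holds(y, X, Z, q_sizes, a0, b0, a, b, tol=1e-10):
    """Return the four conditions of Corollary 2 as a tuple of booleans."""
    y = np.asarray(y, dtype=float)
    q_sizes = np.asarray(q_sizes, dtype=int)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    N, q = Z.shape
    rank_Z = int(np.linalg.matrix_rank(Z))
    # tol guards against round-off when SSE is zero in exact arithmetic (assumption).
    c1 = 2.0 * b0 + sse(y, X, Z) > tol * max(1.0, float(y @ y))
    c2 = all((bi > 0) or (bi == 0 and ai < 0) for ai, bi in zip(a, b))
    c3 = a0 > 0.5 * (rank_Z - N + 2)
    c4 = float(np.min(a + q_sizes / 2.0)) > 0.5 * (q - rank_Z) + 1.0
    return bool(c1), bool(c2), bool(c3), bool(c4)

if __name__ == "__main__":
    X = np.ones((6, 1))
    Z = np.kron(np.eye(3), np.ones((2, 1)))
    y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0])
    print(sse(y, X, Z))
    print(corollary2_holds(y, X, Z, [3], a0=0.0, b0=0.0, a=[-0.25], b=[0.0]))
```

Because $W$ may be rank deficient (or have more columns than rows), the pseudo-inverse is used rather than an ordinary inverse.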

Corollary 2 is the first convergence rate result for the block Gibbs sampler for this set of partially proper priors. Roman and Hobert [12] (hereafter R&H12) proved a similar result (see their Corollary 1) for a different family of improper priors in which our proper multivariate normal prior on $\beta$ is replaced by a flat prior. Of course, because they used a flat prior on $\beta$, their results are only relevant in the case where $X$ has full column rank.

The remainder of this paper is organized as follows. A formal definition of the block Gibbs Markov chain is given in Section 2. Section 3 contains our convergence rate analysis of the block Gibbs sampler under proper and improper priors. A short discussion concerning an alternative result for proper priors appears in Section 4. Some technical details are relegated to an Appendix.

2. The block Gibbs sampler

The block Gibbs sampler is driven by the Markov chain $\{(\theta_n, \lambda_n)\}_{n=0}^{\infty}$, which lives on the space $\mathbb{R}^{p+q} \times \mathbb{R}_+^{r+1}$, where $\mathbb{R}_+ := (0, \infty)$. The Markov transition density (of the version that updates $\theta$ first) is given by

$$k(\tilde\theta, \tilde\lambda \mid \theta, \lambda) = \pi(\tilde\lambda \mid \tilde\theta, y)\, \pi(\tilde\theta \mid \lambda, y).$$

We will often suppress dependence on $y$, as we have in the Markov transition density. The conditional densities, $\pi(\lambda \mid \theta, y)$ and $\pi(\theta \mid \lambda, y)$, are now described. The following formulas hold for both sets of priors (proper and improper). The components of $\lambda$ are conditionally independent given $\theta$, and we have

$$\lambda_e \mid \theta \sim \mathrm{Gamma}\left( a_0 + \frac{N}{2},\; b_0 + \frac{\|y - W\theta\|^2}{2} \right) \tag{2}$$

and, for $i = 1, \ldots, r$,

$$\lambda_{u_i} \mid \theta \sim \mathrm{Gamma}\left( a_i + \frac{q_i}{2},\; b_i + \frac{\|u_i\|^2}{2} \right). \tag{3}$$

When considering improper priors, we assume that these conditional distributions are all well defined. In other words, we assume that $\{a_i\}_{i=0}^{r}$ and $\{b_i\}_{i=0}^{r}$ are such that all of the shape and rate parameters in the gamma distributions above are strictly positive. Of course, this is not enough to guarantee posterior propriety. However, the drift technique that we employ is equally applicable to positive recurrent (proper posterior) and non-positive recurrent (improper posterior) Markov chains [11]. Furthermore, geometrically ergodic chains are necessarily positive recurrent, so any Gibbs Markov chain that we conclude is geometrically ergodic necessarily corresponds to a proper posterior. Consequently, there is no need to check for posterior propriety before proceeding with the convergence analysis.

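Now define $T_\lambda = \lambda_e X^T X + \Sigma_\beta^{-1}$, $M_\lambda = I - \lambda_e X T_\lambda^{-1} X^T$, and $Q_\lambda = \lambda_e Z^T M_\lambda Z + \Lambda$. Conditional on $\lambda$, $\theta$ is multivariate normal with mean

$$E[\theta \mid \lambda] = \begin{bmatrix} T_\lambda^{-1}(\lambda_e X^T y + \Sigma_\beta^{-1} \mu_\beta) - \lambda_e^2\, T_\lambda^{-1} X^T Z Q_\lambda^{-1} Z^T (M_\lambda y - X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta) \\ \lambda_e Q_\lambda^{-1} Z^T (M_\lambda y - X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta) \end{bmatrix} \tag{4}$$

and covariance matrix

$$\mathrm{Var}[\theta \mid \lambda] = \begin{bmatrix} T_\lambda^{-1} + \lambda_e^2\, T_\lambda^{-1} X^T Z Q_\lambda^{-1} Z^T X T_\lambda^{-1} & -\lambda_e T_\lambda^{-1} X^T Z Q_\lambda^{-1} \\ -\lambda_e Q_\lambda^{-1} Z^T X T_\lambda^{-1} & Q_\lambda^{-1} \end{bmatrix}. \tag{5}$$

(A derivation of these conditionals can be found in [1].)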
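
A single sweep of the sampler can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it draws $\theta \mid \lambda$ from the joint precision matrix $\lambda_e W^T W + \mathrm{blockdiag}(\Sigma_\beta^{-1}, \Lambda)$, which is a standard form algebraically equivalent to the partitioned mean and covariance above, and it reads Gamma$(a, b)$ as shape $a$ and rate $b$. The function name `gibbs_sweep` and its argument layout (`a` and `b` hold $(a_0, \ldots, a_r)$ and $(b_0, \ldots, b_r)$) are hypothetical.

```python
import numpy as np

def gibbs_sweep(lam_e, lam_u, y, X, Z, q_sizes, mu_beta, Sigma_beta_inv, a, b, rng):
    """One iteration: draw theta | lambda, then lambda | theta."""
    N, p = X.shape
    q = Z.shape[1]
    W = np.hstack([X, Z])
    # Prior precision of theta = (beta, u): block-diag(Sigma_beta^{-1}, Lambda).
    prior_prec = np.zeros((p + q, p + q))
    prior_prec[:p, :p] = Sigma_beta_inv
    prior_prec[p:, p:] = np.diag(np.repeat(lam_u, q_sizes))
    prec = lam_e * W.T @ W + prior_prec
    rhs = lam_e * W.T @ y
    rhs[:p] += Sigma_beta_inv @ mu_beta
    # theta ~ N(prec^{-1} rhs, prec^{-1}), using the Cholesky factor prec = L L^T.
    L = np.linalg.cholesky(prec)
    mean = np.linalg.solve(prec, rhs)
    theta = mean + np.linalg.solve(L.T, rng.standard_normal(p + q))
    # lambda | theta: independent gammas; NumPy uses scale = 1 / rate.
    resid = y - W @ theta
    new_lam_e = rng.gamma(a[0] + N / 2.0, 1.0 / (b[0] + resid @ resid / 2.0))
    blocks = np.split(theta[p:], np.cumsum(q_sizes)[:-1])
    new_lam_u = np.array([
        rng.gamma(a[i + 1] + qi / 2.0, 1.0 / (b[i + 1] + ui @ ui / 2.0))
        for i, (qi, ui) in enumerate(zip(q_sizes, blocks))
    ])
    return theta, new_lam_e, new_lam_u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, p = 5, 8                                     # more predictors than data points
    X = rng.standard_normal((N, p))
    Z = np.kron(np.eye(2), np.ones((3, 1)))[:N]     # one factor with two levels
    y = rng.standard_normal(N)
    print(gibbs_sweep(1.0, np.array([1.0]), y, X, Z, [2],
                      np.zeros(p), np.eye(p), [1.0, 1.0], [1.0, 1.0], rng))
```

Because $\Sigma_\beta^{-1}$ and $\Lambda$ are positive definite, the joint precision is positive definite even when $X$ is rank deficient or $p > N$, so the Cholesky step does not require any rank condition on $X$.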

The two marginal sequences, $\{\theta_n\}_{n=0}^{\infty}$ and $\{\lambda_n\}_{n=0}^{\infty}$, are themselves Markov chains, and it is easy to establish that (when the posterior is proper) all three chains are Harris ergodic. Moreover, geometric ergodicity is a solidarity property for these three chains, that is, either all three chains are geometrically ergodic, or none of them is (see, e.g., [2, 10, 14]). Again, in contrast with R&H15, who analyzed the $\theta$-chain, $\{\theta_n\}_{n=0}^{\infty}$, we establish our results by analyzing the $\lambda$-chain, $\{\lambda_n\}_{n=0}^{\infty}$. The Mtd of the $\lambda$-chain is given by

$$k_l(\tilde\lambda \mid \lambda) = \int_{\mathbb{R}^{p+q}} \pi(\tilde\lambda \mid \theta, y)\, \pi(\theta \mid \lambda, y)\, d\theta.$$

R&H12 also analyzed the $\lambda$-chain, and their analysis serves as a road map for ours. In fact, we use the same drift function as R&H12.

3. Convergence analysis of the block Gibbs sampler 

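In order to state our main result, we require a couple of definitions. For $i = 1, 2, \ldots, r$, let $R_i$ be the $q_i \times q$ matrix defined as $R_i = [0_{q_i \times q_1} \cdots 0_{q_i \times q_{i-1}} \; I_{q_i \times q_i} \; 0_{q_i \times q_{i+1}} \cdots 0_{q_i \times q_r}]$. Note that $u_i = R_i u$. Let $P_{Z^T Z}$ denote the orthogonal projection onto the column space of $Z^T Z$. Finally, define

$$\tilde{s} = \min\left\{ a_0 + \frac{N}{2},\; a_1 + \frac{q_1}{2},\; \ldots,\; a_r + \frac{q_r}{2} \right\}.$$

The following result holds for both sets of priors (proper and improper).

Proposition 1. The block Gibbs sampler Markov chain, $\{(\theta_n, \lambda_n)\}_{n=0}^{\infty}$, is geometrically ergodic if:

1. $\tilde{s} > 0$;

2. $2b_0 + \mathrm{SSE} > 0$;

3. For each $i \in \{1, 2, \ldots, r\}$, either $b_i > 0$ or $a_i < b_i = 0$; and

4. There exists $s \in (0, 1] \cap (0, \tilde{s})$ such that

$$\max\left\{ \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left( \frac{\mathrm{rank}(Z)}{2} \right)^{s},\; \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( \frac{\mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T)}{2} \right)^{s} \right\} < 1. \tag{6}$$

Remark 1. When the prior is proper, that is, when $a_i > 0$ and $b_i > 0$ for all $i \in \{0, 1, \ldots, r\}$, the first three conditions are automatically satisfied, and $\tilde{s} > 1$. On the other hand, when the prior is improper, these three conditions ensure that $\pi(\lambda \mid \theta, y)$ is well defined.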

Before embarking on our proof of Proposition 1, we quickly demonstrate that Corollaries 1 and 2 follow immediately from it.

Proof of Corollary 1. Since the prior is proper, it is enough to show that the conditions of Corollary 1 imply that (6) is satisfied for some $s \in (0, 1]$. We show that this is indeed the case, with $s = 1$. First,

$$\frac{\Gamma(a_0 + N/2 - 1)}{\Gamma(a_0 + N/2)} \cdot \frac{\mathrm{rank}(Z)}{2} = \frac{\mathrm{rank}(Z)}{2a_0 + N - 2} < 1,$$

which yields the first half of (6). Now note that

$$\sum_{i=1}^{r} \mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T) = \mathrm{tr}\left( (I - P_{Z^T Z}) \sum_{i=1}^{r} R_i^T R_i \right) = \mathrm{tr}(I - P_{Z^T Z}) = q - \mathrm{rank}(Z).$$

Thus,

$$\sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - 1)}{\Gamma(a_i + q_i/2)} \cdot \frac{\mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T)}{2} = \sum_{i=1}^{r} \frac{\mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T)}{2a_i + q_i - 2} \le \frac{q - \mathrm{rank}(Z)}{\min_{i=1,\ldots,r} (2a_i + q_i - 2)} < 1,$$

which yields the second half of (6). □

Proof of Corollary 2. First note that conditions 3 and 4 of Corollary 2 imply that $\tilde{s} > 1$. The rest of the proof is the same as the proof of Corollary 1. □

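Our proof of Proposition 1 is based on four lemmas, which are proven in the Appendix. Let $d_{\max}$ denote the largest singular value of the matrix $\tilde{X} = X \Sigma_\beta^{1/2}$.

Lemma 1. For each $i \in \{1, 2, \ldots, r\}$, we have

$$\mathrm{tr}(R_i Q_\lambda^{-1} R_i^T) \le (d_{\max}^2 + \lambda_e^{-1})\, \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T) + \mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T) \sum_{j=1}^{r} \lambda_{u_j}^{-1}.$$

Lemma 2. $\mathrm{tr}(W \mathrm{Var}(\theta \mid \lambda) W^T) \le \lambda_e^{-1} \mathrm{rank}(Z) + d_{\max}^2 \mathrm{rank}(Z) + \mathrm{tr}(X \Sigma_\beta X^T)$.

Lemma 3. There exist finite constants $K_1$ and $K_2$, not depending on $\lambda$, such that $\|E[R_i u \mid \lambda]\| \le \sqrt{q_i}\, K_1$ for $i = 1, \ldots, r$, and $\|y - W E[\theta \mid \lambda]\| \le K_2$.

Remark 2. The constants $K_1$ and $K_2$ are defined in the Appendix. They do not have a closed form.

We will write $A \preceq B$ to mean that $B - A$ is nonnegative definite. Let $\psi_{\max}$ denote the largest eigenvalue of $Z^T Z$.

Lemma 4. For each $i \in \{1, 2, \ldots, r\}$, we have $(\psi_{\max} \lambda_e + \lambda_{u_i})^{-1} I_{q_i} \preceq R_i Q_\lambda^{-1} R_i^T$.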
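
The matrix bounds in Lemmas 1 and 4 can be sanity-checked numerically for a given design. The sketch below is only a spot check for one parameter value, not a proof; it assumes $d_{\max}$ is the largest singular value of $X \Sigma_\beta^{1/2}$ and that $R_i Q_\lambda^{-1} R_i^T$ is the $i$th diagonal block of $Q_\lambda^{-1}$. The function names and the tolerance `tol` are hypothetical.

```python
import numpy as np

def q_lambda(X, Z, Sigma_beta, lam_e, lam_u, q_sizes):
    """Q_lambda = lam_e Z^T M_lambda Z + Lambda, built from the definitions."""
    N = X.shape[0]
    T = lam_e * X.T @ X + np.linalg.inv(Sigma_beta)
    M = np.eye(N) - lam_e * X @ np.linalg.solve(T, X.T)
    return lam_e * Z.T @ M @ Z + np.diag(np.repeat(lam_u, q_sizes))

def check_lemmas_1_and_4(X, Z, Sigma_beta, lam_e, lam_u, q_sizes, tol=1e-9):
    """Return a list of (lemma1_ok, lemma4_ok), one pair per random factor."""
    lam_u = np.asarray(lam_u, dtype=float)
    q_sizes = np.asarray(q_sizes, dtype=int)
    q = int(q_sizes.sum())
    Q_inv = np.linalg.inv(q_lambda(X, Z, Sigma_beta, lam_e, lam_u, q_sizes))
    ZtZ = Z.T @ Z
    ZtZ_pinv = np.linalg.pinv(ZtZ)
    I_minus_P = np.eye(q) - ZtZ @ ZtZ_pinv           # I - P_{Z^T Z}
    psi_max = np.linalg.eigvalsh(ZtZ)[-1]
    # d_max^2 is the largest eigenvalue of X Sigma_beta X^T.
    d_max2 = np.linalg.eigvalsh(X @ Sigma_beta @ X.T)[-1]
    ends = np.cumsum(q_sizes)
    starts = ends - q_sizes
    out = []
    for i, (lo, hi) in enumerate(zip(starts, ends)):
        block = Q_inv[lo:hi, lo:hi]                  # R_i Q^{-1} R_i^T
        bound = ((d_max2 + 1.0 / lam_e) * np.trace(ZtZ_pinv[lo:hi, lo:hi])
                 + np.trace(I_minus_P[lo:hi, lo:hi]) * np.sum(1.0 / lam_u))
        lemma1 = np.trace(block) <= bound + tol
        lemma4 = np.linalg.eigvalsh(block)[0] >= 1.0 / (psi_max * lam_e + lam_u[i]) - tol
        out.append((bool(lemma1), bool(lemma4)))
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    Z = np.array([[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]], dtype=float)
    X = rng.standard_normal((4, 6))                  # p > N
    print(check_lemmas_1_and_4(X, Z, 2.0 * np.eye(6), 1.3, [0.7, 2.0], [2, 2]))
```

The example uses a crossed two-factor layout in which $Z$ is rank deficient and $X$ has more columns than rows, the setting the lemmas are designed to cover.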
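
Proof of Proposition 1. Define the drift function as follows

$$v(\lambda) = \alpha \lambda_e^{c} + \alpha \lambda_e^{-s} + \sum_{i=1}^{r} \lambda_{u_i}^{c} + \sum_{i=1}^{r} \lambda_{u_i}^{-s},$$

where $\alpha$ and $c$ are positive constants (that are explicitly constructed in the proof), and $s$ is from the fourth condition in Proposition 1. We will show that there exist $\rho \in [0, 1)$ and a finite constant $L$ such that

$$E[v(\tilde\lambda) \mid \lambda] = \int_{\mathbb{R}_+^{r+1}} v(\tilde\lambda)\, k_l(\tilde\lambda \mid \lambda)\, d\tilde\lambda \le \rho\, v(\lambda) + L. \tag{7}$$

Then because the $\lambda$-chain is a Feller chain [1] and the function $v(\cdot)$ is unbounded off compact sets (R&H12), by Meyn and Tweedie's [8], Lemma 15.2.8, the geometric drift condition (7) implies that the $\lambda$-chain is geometrically ergodic. We now establish (7). First, note that

$$E[v(\tilde\lambda) \mid \lambda] = \int_{\mathbb{R}^{p+q}} \left[ \int_{\mathbb{R}_+^{r+1}} v(\tilde\lambda)\, \pi(\tilde\lambda \mid \theta, y)\, d\tilde\lambda \right] \pi(\theta \mid \lambda, y)\, d\theta = E[E[v(\tilde\lambda) \mid \theta] \mid \lambda]$$

$$= \alpha E[E[\tilde\lambda_e^{c} \mid \theta] \mid \lambda] + \alpha E[E[\tilde\lambda_e^{-s} \mid \theta] \mid \lambda] + \sum_{i=1}^{r} E[E[\tilde\lambda_{u_i}^{c} \mid \theta] \mid \lambda] + \sum_{i=1}^{r} E[E[\tilde\lambda_{u_i}^{-s} \mid \theta] \mid \lambda].$$

Using (2) and the fact that $0 < b_0 + \mathrm{SSE}/2 \le b_0 + \|y - W\theta\|^2/2$, we have

$$E[\tilde\lambda_e^{c} \mid \theta] = \frac{\Gamma(a_0 + N/2 + c)}{\Gamma(a_0 + N/2)} \left( b_0 + \frac{\|y - W\theta\|^2}{2} \right)^{-c} \le \frac{\Gamma(a_0 + N/2 + c)}{\Gamma(a_0 + N/2)} \left( b_0 + \frac{\mathrm{SSE}}{2} \right)^{-c}. \tag{8}$$

As we shall see, since this upper bound does not depend on $\theta$, it can be absorbed into the constant term, $L$, and we will no longer have to deal with this piece of the drift function.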
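Now,

$$E[\tilde\lambda_e^{-s} \mid \theta] = \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left( b_0 + \frac{\|y - W\theta\|^2}{2} \right)^{s} \le \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left( |b_0|^{s} + \left[ \frac{\|y - W\theta\|^2}{2} \right]^{s} \right), \tag{9}$$

where the inequality follows from the fact that $(x_1 + x_2)^{\xi} \le x_1^{\xi} + x_2^{\xi}$ for $x_1, x_2 \ge 0$ and $\xi \in (0, 1]$. Similarly, using (3), for each $i \in \{1, \ldots, r\}$ we have

$$E[\tilde\lambda_{u_i}^{-s} \mid \theta] = \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( b_i + \frac{\|u_i\|^2}{2} \right)^{s} \le \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( b_i^{s} + \left[ \frac{\|u_i\|^2}{2} \right]^{s} \right). \tag{10}$$

Now, for each $i \in \{1, \ldots, r\}$, we have

$$E[\tilde\lambda_{u_i}^{c} \mid \theta] = \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \left( b_i + \frac{\|u_i\|^2}{2} \right)^{-c} \le \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \left[ \left( \frac{\|u_i\|^2}{2} \right)^{-c} I_{\{0\}}(b_i) + b_i^{-c}\, I_{\mathbb{R}_+}(b_i) \right]. \tag{11}$$

Note that when $b_i > 0$ there is a simple upper bound for this term that does not depend on $\theta$. Therefore, we will first consider the case in which $\min\{b_1, \ldots, b_r\} > 0$, and we will return to the other (more complicated) case later.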

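Assume for the time being that $\min\{b_1, \ldots, b_r\} > 0$. Then combining (8), (9), (10) and (11), and applying Jensen's inequality twice yields

$$E[v(\tilde\lambda) \mid \lambda] \le \alpha\, 2^{-s} \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left( E[\|y - W\theta\|^2 \mid \lambda] \right)^{s} + 2^{-s} \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( E[\|u_i\|^2 \mid \lambda] \right)^{s} + K_0, \tag{12}$$

where

$$K_0 = \alpha \frac{\Gamma(a_0 + N/2 + c)}{\Gamma(a_0 + N/2)} \left( b_0 + \frac{\mathrm{SSE}}{2} \right)^{-c} + \alpha \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} |b_0|^{s} + \sum_{i=1}^{r} \left[ \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)}\, b_i^{-c} + \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)}\, b_i^{s} \right].$$

It follows from (5) that

$$E[\|y - W\theta\|^2 \mid \lambda] = \mathrm{tr}(W \mathrm{Var}(\theta \mid \lambda) W^T) + \|y - W E[\theta \mid \lambda]\|^2.$$

Similarly, since $u_i = R_i u$, we also have

$$E[\|u_i\|^2 \mid \lambda] = E[\|R_i u\|^2 \mid \lambda] = \mathrm{tr}(R_i Q_\lambda^{-1} R_i^T) + \|E[R_i u \mid \lambda]\|^2.$$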

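Now, using Lemmas 2 and 3, we have

$$\left( E[\|y - W\theta\|^2 \mid \lambda] \right)^{s} \le \left[ \lambda_e^{-1} \mathrm{rank}(Z) + d_{\max}^2 \mathrm{rank}(Z) + \mathrm{tr}(X \Sigma_\beta X^T) + K_2^2 \right]^{s} \le \lambda_e^{-s} (\mathrm{rank}(Z))^{s} + \left[ d_{\max}^2 \mathrm{rank}(Z) + \mathrm{tr}(X \Sigma_\beta X^T) + K_2^2 \right]^{s}. \tag{13}$$

Similarly, using Lemmas 1 and 3, we have

$$\left( E[\|u_i\|^2 \mid \lambda] \right)^{s} \le \left[ (d_{\max}^2 + \lambda_e^{-1})\, \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T) + \mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T) \sum_{j=1}^{r} \lambda_{u_j}^{-1} + q_i K_1^2 \right]^{s}$$

$$\le \lambda_e^{-s} \left( \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T) \right)^{s} + \sum_{j=1}^{r} \lambda_{u_j}^{-s} \left( \mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T) \right)^{s} + \left[ d_{\max}^2\, \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T) + q_i K_1^2 \right]^{s}. \tag{14}$$

Define a function $\delta(\cdot)$ as follows:

$$\delta(\alpha) = \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left( \frac{\mathrm{rank}(Z)}{2} \right)^{s} + \frac{1}{\alpha} \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( \frac{\mathrm{tr}(R_i (Z^T Z)^{+} R_i^T)}{2} \right)^{s}.$$

Combining (12), (13) and (14) yields

$$E[v(\tilde\lambda) \mid \lambda] \le \alpha\, \delta(\alpha)\, \lambda_e^{-s} + \left[ \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( \frac{\mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T)}{2} \right)^{s} \right] \sum_{j=1}^{r} \lambda_{u_j}^{-s} + L, \tag{15}$$

where

$$L = K_0 + \alpha\, 2^{-s} \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left[ d_{\max}^2 \mathrm{rank}(Z) + \mathrm{tr}(X \Sigma_\beta X^T) + K_2^2 \right]^{s} + 2^{-s} \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left[ d_{\max}^2\, \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T) + q_i K_1^2 \right]^{s}.$$

Next, defining

$$\rho(\alpha) := \max\left\{ \delta(\alpha),\; \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( \frac{\mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T)}{2} \right)^{s} \right\},$$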

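we have from (15) that

$$E[v(\tilde\lambda) \mid \lambda] \le \alpha\, \rho(\alpha)\, \lambda_e^{-s} + \rho(\alpha) \sum_{j=1}^{r} \lambda_{u_j}^{-s} + L \le \rho(\alpha) \left( \alpha \lambda_e^{c} + \alpha \lambda_e^{-s} + \sum_{i=1}^{r} \lambda_{u_i}^{c} + \sum_{i=1}^{r} \lambda_{u_i}^{-s} \right) + L = \rho(\alpha)\, v(\lambda) + L.$$

Hence, all that is left is to demonstrate the existence of an $\alpha \in (0, \infty)$ such that $\rho(\alpha) \in [0, 1)$. By (6), we know that

$$\sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( \frac{\mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T)}{2} \right)^{s} < 1.$$

Therefore, it suffices to show that there exists an $\alpha \in (0, \infty)$ such that $\delta(\alpha) < 1$. But, by the definition of $\delta(\cdot)$, $\delta(\alpha) < 1$ as long as

$$\alpha > \frac{\displaystyle \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T)/2 \right)^{s}}{\displaystyle 1 - \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left( \mathrm{rank}(Z)/2 \right)^{s}}, \tag{16}$$

which is a well-defined positive number by (6). The result has now been proven for the case in which $\min\{b_1, \ldots, b_r\} > 0$.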
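
For a concrete design, the quantities appearing in this part of the proof can be evaluated directly. The sketch below computes the threshold obtained by solving $\delta(\alpha) < 1$ for $\alpha$, together with $\delta(\alpha)$ and $\rho(\alpha)$, taking $\delta(\alpha)$ to be the coefficient of $\alpha \lambda_e^{-s}$ that results from combining (12), (13) and (14). The function name `drift_constants`, the default choice of `alpha` (twice the threshold) and the clipping of tiny negative traces caused by round-off are assumptions of this illustration.

```python
import math
import numpy as np

def drift_constants(Z, N, q_sizes, a0, a, s, alpha=None):
    """Return (alpha_min, delta(alpha), rho(alpha)) for the case min b_i > 0."""
    Z = np.asarray(Z, dtype=float)
    q_sizes = np.asarray(q_sizes, dtype=int)
    a = np.asarray(a, dtype=float)
    q = int(q_sizes.sum())
    ZtZ = Z.T @ Z
    ZtZ_pinv = np.linalg.pinv(ZtZ)
    I_minus_P = np.eye(q) - ZtZ @ ZtZ_pinv           # I - P_{Z^T Z}
    rank_Z = int(np.linalg.matrix_rank(Z))

    def gamma_ratio(x):
        # Gamma(x - s) / Gamma(x); requires x > s.
        return math.exp(math.lgamma(x - s) - math.lgamma(x))

    first = gamma_ratio(a0 + N / 2.0) * (rank_Z / 2.0) ** s
    ends = np.cumsum(q_sizes)
    starts = ends - q_sizes
    b_plus, b_proj = 0.0, 0.0
    for ai, qi, lo, hi in zip(a, q_sizes, starts, ends):
        g = gamma_ratio(ai + qi / 2.0)
        tr_plus = max(float(np.trace(ZtZ_pinv[lo:hi, lo:hi])), 0.0)
        tr_proj = max(float(np.trace(I_minus_P[lo:hi, lo:hi])), 0.0)
        b_plus += g * (tr_plus / 2.0) ** s
        b_proj += g * (tr_proj / 2.0) ** s
    if first >= 1.0 or b_proj >= 1.0:
        raise ValueError("condition (6) fails for this s")
    alpha_min = b_plus / (1.0 - first)
    if alpha is None:
        alpha = 2.0 * alpha_min if alpha_min > 0 else 1.0
    delta = first + b_plus / alpha
    return alpha_min, delta, max(delta, b_proj)

if __name__ == "__main__":
    Z = np.array([[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]], dtype=float)
    print(drift_constants(Z, 4, [2, 2], a0=1.0, a=[1.0, 1.0], s=1.0, alpha=5.0))
```

Any `alpha` strictly above the returned threshold gives a drift rate below one, which is all the argument requires.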


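Remark 3. Note that the two terms in the drift function involving $c$ were both absorbed into the constant in the first step of the iterated expectation. It follows that, at least in the case where $\min\{b_1, \ldots, b_r\} > 0$, any $c > 0$ can be used in the drift function.

We now proceed to the case in which there is at least one $b_i = 0$. Let $B = \{i \in \{1, \ldots, r\} : b_i = 0\}$. It follows from the development above that the following holds for any $c > 0$:

$$\alpha E[\tilde\lambda_e^{c} \mid \lambda] + \alpha E[\tilde\lambda_e^{-s} \mid \lambda] + \sum_{i \notin B} E[\tilde\lambda_{u_i}^{c} \mid \lambda] + \sum_{i=1}^{r} E[\tilde\lambda_{u_i}^{-s} \mid \lambda] \le \rho(\alpha) \left( \alpha \lambda_e^{-s} + \sum_{i=1}^{r} \lambda_{u_i}^{-s} \right) + L. \tag{17}$$

Of course, if $\alpha$ satisfies (16), then $\rho(\alpha) \in [0, 1)$. Now suppose we can find $c > 0$, $\alpha$ satisfying (16), and $\rho'(\alpha) \in [0, 1)$ such that

$$\sum_{i \in B} E[\tilde\lambda_{u_i}^{c} \mid \lambda] \le \rho'(\alpha) \left( \alpha \lambda_e^{c} + \sum_{i \in B} \lambda_{u_i}^{c} \right). \tag{18}$$

Then combining (17) and (18), we would have

$$E[v(\tilde\lambda) \mid \lambda] \le \rho(\alpha) \left( \alpha \lambda_e^{-s} + \sum_{i=1}^{r} \lambda_{u_i}^{-s} \right) + L + \rho'(\alpha) \left( \alpha \lambda_e^{c} + \sum_{i \in B} \lambda_{u_i}^{c} \right) \le \max\{\rho(\alpha), \rho'(\alpha)\}\, v(\lambda) + L,$$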
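which establishes the drift condition. Therefore, to prove the result when $\min\{b_1, \ldots, b_r\} = 0$, it suffices to establish (18). If $i \in B$, then

$$E[\tilde\lambda_{u_i}^{c} \mid \theta] = \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \left( \frac{\|u_i\|^2}{2} \right)^{-c}.$$

It follows from (5) that the conditional distribution of $(R_i Q_\lambda^{-1} R_i^T)^{-1/2} u_i$ given $\lambda$ is multivariate normal with identity covariance matrix. Thus, $u_i^T (R_i Q_\lambda^{-1} R_i^T)^{-1} u_i$ has a non-central chi-squared distribution with $q_i$ degrees of freedom. An application of Lemma 4 from R&H12 shows that, if $c \in (0, 1/2)$, then

$$E\left[ \left( u_i^T (R_i Q_\lambda^{-1} R_i^T)^{-1} u_i \right)^{-c} \,\middle|\, \lambda \right] \le 2^{-c}\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}.$$

Putting this together with Lemma 4, we have that, if $i \in B$ and $c \in (0, 1/2)$, then

$$E[(\|u_i\|^2)^{-c} \mid \lambda] = (\psi_{\max} \lambda_e + \lambda_{u_i})^{c}\, E\left[ \left( u_i^T (\psi_{\max} \lambda_e + \lambda_{u_i}) I_{q_i} u_i \right)^{-c} \,\middle|\, \lambda \right]$$

$$\le (\psi_{\max} \lambda_e + \lambda_{u_i})^{c}\, E\left[ \left( u_i^T (R_i Q_\lambda^{-1} R_i^T)^{-1} u_i \right)^{-c} \,\middle|\, \lambda \right]$$

$$\le 2^{-c}\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} (\psi_{\max} \lambda_e + \lambda_{u_i})^{c} \le 2^{-c}\, \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} \left( \psi_{\max}^{c} \lambda_e^{c} + \lambda_{u_i}^{c} \right).$$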

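Define $\delta'(\cdot)$ as follows:

$$\delta'(\alpha) = \frac{\psi_{\max}^{c}}{\alpha} \sum_{i \in B} \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}.$$

Now we have

$$\sum_{i \in B} E[\tilde\lambda_{u_i}^{c} \mid \lambda] \le \sum_{i \in B} \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} \left( \psi_{\max}^{c} \lambda_e^{c} + \lambda_{u_i}^{c} \right) = \alpha\, \delta'(\alpha)\, \lambda_e^{c} + \sum_{i \in B} \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}\, \lambda_{u_i}^{c}.$$

Next, defining

$$\rho'(\alpha) = \max\left\{ \delta'(\alpha),\; \max_{i \in B} \left[ \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} \right] \right\},$$

we have

$$\sum_{i \in B} E[\tilde\lambda_{u_i}^{c} \mid \lambda] \le \rho'(\alpha) \left( \alpha \lambda_e^{c} + \sum_{i \in B} \lambda_{u_i}^{c} \right).$$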
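
Hence, all we have left to do is to prove that there exist $c \in (0, 1/2)$ and $\alpha$ satisfying (16) such that $\rho'(\alpha) \in [0, 1)$. First, define $\tilde{a} = -\max_{i \in B} a_i$, and note that this quantity is positive. R&H12 show that, if $c \in (0, 1/2) \cap (0, \tilde{a})$, then

$$\max_{i \in B} \left[ \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} \right] < 1.$$

Fix $c \in (0, 1/2) \cap (0, \tilde{a})$. Now it suffices to show that there exists an $\alpha$ satisfying (16) such that $\delta'(\alpha) < 1$. But $\delta'(\alpha) < 1$ as long as

$$\alpha > \psi_{\max}^{c} \sum_{i \in B} \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)}.$$

So, (18) is satisfied for $c \in (0, 1/2) \cap (0, \tilde{a})$ and

$$\alpha > \max\left\{ \frac{\displaystyle \sum_{i=1}^{r} \frac{\Gamma(a_i + q_i/2 - s)}{\Gamma(a_i + q_i/2)} \left( \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T)/2 \right)^{s}}{\displaystyle 1 - \frac{\Gamma(a_0 + N/2 - s)}{\Gamma(a_0 + N/2)} \left( \mathrm{rank}(Z)/2 \right)^{s}},\;\; \psi_{\max}^{c} \sum_{i \in B} \frac{\Gamma(a_i + q_i/2 + c)}{\Gamma(a_i + q_i/2)} \cdot \frac{\Gamma(q_i/2 - c)}{\Gamma(q_i/2)} \right\}.$$

□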


4. Discussion 

Our Corollary 1 is a direct generalization of Roman and Hobert's [13] Proposition 1 where we have removed all restrictions on the matrix $X$. We now present a related result from [1] that is established using a different drift function.

Proposition 2. Under a proper prior, the block Gibbs Markov chain, $\{(\lambda_n, \theta_n)\}_{n=0}^{\infty}$, is geometrically ergodic if $\min\{a_0, a_1, \ldots, a_r\} > 1$.

Like Corollary 1, this result holds for any $X$. Neither result is uniformly better than the other. That is, there are situations where the conditions of Corollary 1 hold, but those of Proposition 2 do not, and vice versa. However, the condition $\min\{a_0, a_1, \ldots, a_r\} > 1$ appears to be more restrictive than the conditions of Corollary 1 in nearly all practical settings. In fact, the only examples we could find where Proposition 2 is better than Corollary 1 involve models that have more random effects than observations. On the other hand, we do feel that Proposition 2 is worth mentioning because its simple form may render it useful to practitioners. For example, in an exploratory phase where a number of different models are being considered for a given set of data, one could avoid having to recheck the conditions of Corollary 1 each time the model is changed simply by taking $a_0 = a_1 = \cdots = a_r = a > 1$ for all models under consideration.

Appendix A: Preliminary results

Recall that $\tilde{X} = X \Sigma_\beta^{1/2}$. Let $k = \mathrm{rank}(X) = \mathrm{rank}(\tilde{X}) \le \min\{N, p\}$, and consider a singular value decomposition of $\tilde{X}$ given by $U D V^T$, where $U$ and $V$ are orthogonal matrices of dimension $N$ and $p$, respectively, and

$$D := \begin{bmatrix} D_* & 0_{k, p-k} \\ 0_{N-k, k} & 0_{N-k, p-k} \end{bmatrix},$$

where $D_* := \mathrm{diag}\{d_1, \ldots, d_k\}$. The values $d_1, \ldots, d_k$ are the singular values of $\tilde{X}$, which are strictly positive. Again, $d_{\max}$ denotes the largest singular value. The following result is an extension of Lemmas 4 and 5 in R&H15.

Lemma 5. The matrix $M_\lambda$ can be represented as $U H_\lambda U^T$ where $H_\lambda$ is an $N \times N$ diagonal matrix, $H_\lambda = \mathrm{diag}\{h_1, \ldots, h_N\}$, where

$$h_i = \begin{cases} \dfrac{1}{\lambda_e d_i^2 + 1}, & i \in \{1, \ldots, k\}, \\[1ex] 1, & i \in \{k+1, \ldots, N\}. \end{cases}$$

Furthermore, $(\lambda_e d_{\max}^2 + 1)^{-1} I \preceq M_\lambda \preceq I$.

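Proof. Using the definitions of $T_\lambda$ and $\tilde{X}$, we have

$$M_\lambda = I - \lambda_e X T_\lambda^{-1} X^T = I - \lambda_e \tilde{X} (\lambda_e \tilde{X}^T \tilde{X} + I)^{-1} \tilde{X}^T.$$

Now using $\tilde{X} = U D V^T$ leads to

$$M_\lambda = U \left( I - \lambda_e D (\lambda_e D^T D + I)^{-1} D^T \right) U^T.$$

The matrix $\lambda_e D (\lambda_e D^T D + I)^{-1} D^T$ is an $N \times N$ diagonal matrix whose $j$th diagonal element is given by

$$\frac{\lambda_e d_j^2}{\lambda_e d_j^2 + 1}\, I_{\{1, 2, \ldots, k\}}(j).$$

Hence, $I - \lambda_e D (\lambda_e D^T D + I)^{-1} D^T = H_\lambda$, and $M_\lambda = U H_\lambda U^T$. To prove the second part, note that, for $j = 1, \ldots, N$, $0 < (\lambda_e d_{\max}^2 + 1)^{-1} \le h_j \le 1$. Thus,

$$(\lambda_e d_{\max}^2 + 1)^{-1} I = U (\lambda_e d_{\max}^2 + 1)^{-1} I\, U^T \preceq U H_\lambda U^T \preceq U U^T = I. \qquad \square$$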
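
The representation in Lemma 5 is easy to confirm numerically. The following sketch builds $M_\lambda$ both directly from its definition and from the singular value decomposition of $X \Sigma_\beta^{1/2}$, using the symmetric square root of $\Sigma_\beta$; the function names are hypothetical, and singular values that are numerically zero simply give $h_i \approx 1$.

```python
import numpy as np

def m_lambda_direct(X, Sigma_beta, lam_e):
    """M_lambda = I - lam_e X T^{-1} X^T with T = lam_e X^T X + Sigma_beta^{-1}."""
    N = X.shape[0]
    T = lam_e * X.T @ X + np.linalg.inv(Sigma_beta)
    return np.eye(N) - lam_e * X @ np.linalg.solve(T, X.T)

def m_lambda_svd(X, Sigma_beta, lam_e):
    """Return (U H U^T, h, d) following the representation in Lemma 5."""
    w, Qm = np.linalg.eigh(Sigma_beta)
    root = (Qm * np.sqrt(w)) @ Qm.T              # symmetric square root of Sigma_beta
    X_tilde = X @ root
    U, d, _ = np.linalg.svd(X_tilde, full_matrices=True)
    h = np.ones(X.shape[0])
    h[: d.size] = 1.0 / (lam_e * d**2 + 1.0)     # zero singular values give h = 1
    return (U * h) @ U.T, h, d

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.standard_normal((4, 3))
    X = np.hstack([X, X, X[:, :1]])              # 4 x 7: rank deficient and p > N
    A = rng.standard_normal((7, 7))
    Sigma = A @ A.T + np.eye(7)
    M1 = m_lambda_direct(X, Sigma, 0.8)
    M2, h, d = m_lambda_svd(X, Sigma, 0.8)
    print(np.max(np.abs(M1 - M2)), h)
```

The diagonal entries `h` also make the eigenvalue bounds of the lemma visible: each lies between $(\lambda_e d_{\max}^2 + 1)^{-1}$ and 1.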


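Next, we develop an extension of Lemma 2 in R&H15. Define $\tilde{Z} = U^T Z$, $\tilde{y} = U^T y$ and $\eta = V^T \Sigma_\beta^{-1/2} \mu_\beta$. Also, let $\tilde{z}_i$ denote the $i$th column of $\tilde{Z}^T$, and let $\tilde{y}_i$ and $\eta_i$ represent the $i$th components of the vectors $\tilde{y}$ and $\eta$, respectively. Let $t_1, t_2, \ldots, t_{N+q}$ be a set of $q$-vectors defined as follows. For $j = 1, \ldots, N$, let $t_j = \tilde{z}_j$, and let $t_{N+1}, \ldots, t_{N+q}$ be the standard basis vectors in $\mathbb{R}^q$. For $i = 1, \ldots, N$, define

$$C_i^* = \left[ \sup_{a \in \mathbb{R}_+^{N+q}} t_i^T \left( t_i t_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} a_j t_j t_j^T + \sum_{j=N+1}^{N+q} a_j t_j t_j^T + a_i I \right)^{-2} t_i \right]^{1/2}.$$

The $C_i^*$'s are finite by [7], Lemma 3.

Lemma 6. For all $\lambda \in \mathbb{R}_+^{r+1}$,

$$\|\lambda_e Q_\lambda^{-1} Z^T M_\lambda y\| \le \sum_{i=1}^{N} |\tilde{y}_i|\, C_i^* < \infty,$$

$$\|\lambda_e Q_\lambda^{-1} Z^T X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta\| \le \sum_{i=1}^{k} d_i |\eta_i|\, C_i^* < \infty.$$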

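Proof. Even though R&H15 assume $X$ to be full column rank, their argument still works to establish the first inequality, so we omit this argument. We now establish the second inequality. First,

$$U^T X T_\lambda^{-1} \Sigma_\beta^{-1/2} V = U^T \tilde{X} (\lambda_e \tilde{X}^T \tilde{X} + I)^{-1} V = D (\lambda_e D^T D + I)^{-1}.$$

Define $R_\lambda = D (\lambda_e D^T D + I)^{-1}$. This is an $N \times p$ diagonal matrix, with diagonal elements $r_1, r_2, \ldots, r_{\min\{N, p\}}$. These take the form

$$r_i = \frac{d_i}{\lambda_e d_i^2 + 1}\, I_{\{1, 2, \ldots, k\}}(i).$$

Now

$$\|\lambda_e Q_\lambda^{-1} Z^T X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta\| = \|\lambda_e (\lambda_e Z^T M_\lambda Z + \Lambda)^{-1} Z^T U R_\lambda V^T \Sigma_\beta^{-1/2} \mu_\beta\|$$

$$= \|(\tilde{Z}^T H_\lambda \tilde{Z} + \lambda_e^{-1} \Lambda)^{-1} \tilde{Z}^T R_\lambda \eta\|$$

$$= \left\| \sum_{i=1}^{k} (\tilde{Z}^T H_\lambda \tilde{Z} + \lambda_e^{-1} \Lambda)^{-1} \tilde{z}_i r_i \eta_i \right\|$$

$$\le \sum_{i=1}^{k} \left\| (\tilde{Z}^T H_\lambda \tilde{Z} + \lambda_e^{-1} \Lambda)^{-1} \tilde{z}_i r_i \eta_i \right\|$$

$$= \sum_{i=1}^{k} \left\| \left( \sum_{j=1}^{N} \tilde{z}_j \tilde{z}_j^T h_j + \lambda_e^{-1} \Lambda \right)^{-1} \tilde{z}_i r_i \eta_i \right\|$$

$$= \sum_{i=1}^{k} \frac{r_i}{h_i} |\eta_i| \left\| \left( \tilde{z}_i \tilde{z}_i^T + \sum_{j \ne i} \frac{h_j}{h_i} \tilde{z}_j \tilde{z}_j^T + \frac{1}{h_i \lambda_e} \Lambda \right)^{-1} \tilde{z}_i \right\|$$

$$= \sum_{i=1}^{k} d_i |\eta_i| \left\| \left( \tilde{z}_i \tilde{z}_i^T + \sum_{j \ne i} \frac{h_j}{h_i} \tilde{z}_j \tilde{z}_j^T + \frac{1}{h_i \lambda_e} \Lambda \right)^{-1} \tilde{z}_i \right\|,$$

where, in the last step, we have used the fact that $h_i d_i = r_i$ for $i = 1, \ldots, k$. For $i = 1, 2, \ldots, k$, define

$$C_i(\lambda) = \left\| \left( \tilde{z}_i \tilde{z}_i^T + \sum_{j \ne i} \frac{h_j}{h_i} \tilde{z}_j \tilde{z}_j^T + \frac{1}{h_i \lambda_e} \Lambda \right)^{-1} \tilde{z}_i \right\|.$$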


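Define $\lambda_\bullet = \sum_{j=1}^{r} \lambda_{u_j}^{-1}$. Fix $i$, and note that

$$C_i^2(\lambda) = \tilde{z}_i^T \left( \tilde{z}_i \tilde{z}_i^T + \sum_{j \ne i} \frac{h_j}{h_i} \tilde{z}_j \tilde{z}_j^T + \frac{1}{h_i \lambda_e} \Lambda \right)^{-2} \tilde{z}_i$$

$$= \tilde{z}_i^T \left( \tilde{z}_i \tilde{z}_i^T + \sum_{j \ne i} \frac{h_j}{h_i} \tilde{z}_j \tilde{z}_j^T + \frac{1}{h_i \lambda_e} (\Lambda - \lambda_\bullet^{-1} I) + \frac{1}{h_i \lambda_e \lambda_\bullet} I \right)^{-2} \tilde{z}_i.$$

Define $\{w_j\}_{j=1}^{N+q}$ as follows:

$$w_j = \begin{cases} \dfrac{h_j}{h_i}, & j = 1, \ldots, i-1, i+1, \ldots, N, \\[1ex] \dfrac{1}{h_i \lambda_e \lambda_\bullet}, & j = i, \\[1ex] \dfrac{\lambda_{u_1} - \lambda_\bullet^{-1}}{h_i \lambda_e}, & j = N+1, \ldots, N+q_1, \\[1ex] \dfrac{\lambda_{u_2} - \lambda_\bullet^{-1}}{h_i \lambda_e}, & j = N+q_1+1, \ldots, N+q_1+q_2, \\ \quad \vdots & \quad \vdots \\ \dfrac{\lambda_{u_r} - \lambda_\bullet^{-1}}{h_i \lambda_e}, & j = N+q_1+\cdots+q_{r-1}+1, \ldots, N+q. \end{cases}$$

Then

$$C_i^2(\lambda) = t_i^T \left( t_i t_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} w_j t_j t_j^T + \sum_{j=N+1}^{N+q} w_j t_j t_j^T + w_i I \right)^{-2} t_i. \tag{A.1}$$

Clearly, $w_j > 0$ for all $j = 1, \ldots, N+q$. It follows that

$$C_i^2(\lambda) \le \sup_{a \in \mathbb{R}_+^{N+q}} t_i^T \left( t_i t_i^T + \sum_{j \in \{1, 2, \ldots, N\} \setminus \{i\}} a_j t_j t_j^T + \sum_{j=N+1}^{N+q} a_j t_j t_j^T + a_i I \right)^{-2} t_i = (C_i^*)^2.$$

Hence,

$$\|\lambda_e Q_\lambda^{-1} Z^T X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta\| \le \sum_{i=1}^{k} d_i |\eta_i|\, C_i^*.$$

□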


Appendix B: Proof of Lemma 1 

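Lemma 1. For each $i \in \{1, 2, \ldots, r\}$, we have

$$\mathrm{tr}(R_i Q_\lambda^{-1} R_i^T) \le (d_{\max}^2 + \lambda_e^{-1})\, \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T) + \mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T) \sum_{j=1}^{r} \lambda_{u_j}^{-1}.$$

Proof. From Lemma 5 we have

$$Q_\lambda = \lambda_e Z^T M_\lambda Z + \Lambda \succeq \frac{\lambda_e}{\lambda_e d_{\max}^2 + 1} Z^T Z + \Lambda \succeq \frac{\lambda_e}{\lambda_e d_{\max}^2 + 1} Z^T Z + \lambda_{\min} I,$$

where $\lambda_{\min} = \min\{\lambda_{u_1}, \ldots, \lambda_{u_r}\}$. Letting $O \Psi O^T$ be the spectral decomposition of $Z^T Z$, we have

$$Q_\lambda^{-1} \preceq \left( \frac{\lambda_e}{\lambda_e d_{\max}^2 + 1} Z^T Z + \lambda_{\min} I \right)^{-1} = O \left( \frac{\Psi}{d_{\max}^2 + \lambda_e^{-1}} + \lambda_{\min} I \right)^{-1} O^T. \tag{B.1}$$

Next, let $\Psi^{+}$ be a $q \times q$ diagonal matrix whose $i$th diagonal element is $\psi_i^{+} = \psi_i^{-1} I_{(0, \infty)}(\psi_i)$, where $\psi_i$ denotes the $i$th diagonal element of $\Psi$. Now note that, for $i = 1, \ldots, q$, we have

$$\left( \frac{\psi_i}{d_{\max}^2 + \lambda_e^{-1}} + \lambda_{\min} \right)^{-1} \le (d_{\max}^2 + \lambda_e^{-1})\, \psi_i^{+} + \lambda_{\min}^{-1}\, I_{\{0\}}(\psi_i).$$

Hence,

$$\left( \frac{\Psi}{d_{\max}^2 + \lambda_e^{-1}} + \lambda_{\min} I \right)^{-1} \preceq (d_{\max}^2 + \lambda_e^{-1})\, \Psi^{+} + \lambda_{\min}^{-1} (I - P_\Psi), \tag{B.2}$$

where $P_\Psi$ is a $q \times q$ diagonal matrix whose $i$th diagonal entry is $1 - I_{\{0\}}(\psi_i)$. Combining (B.1) and (B.2) yields

$$Q_\lambda^{-1} \preceq (d_{\max}^2 + \lambda_e^{-1})\, O \Psi^{+} O^T + \lambda_{\min}^{-1}\, O (I - P_\Psi) O^T = (d_{\max}^2 + \lambda_e^{-1}) (Z^T Z)^{+} + \lambda_{\min}^{-1}\, O (I - P_\Psi) O^T.$$

Let $\mathcal{I} = \{i \in \{1, \ldots, q\} : \psi_i > 0\}$, and let $\tilde{O}$ be the sub-matrix of $O$ consisting of the column vectors $o_i$ where $i \in \mathcal{I}$. Then

$$O P_\Psi O^T = \sum_{i \in \mathcal{I}} o_i o_i^T = \tilde{O} \tilde{O}^T.$$

Since $\{o_i\}_{i \in \mathcal{I}}$ forms an orthonormal basis for the column space of $Z^T Z$, it follows that $\tilde{O} \tilde{O}^T$ is the orthogonal projection onto the column space of $Z^T Z$. Consequently,

$$O (I - P_\Psi) O^T = O O^T - O P_\Psi O^T = I - \tilde{O} \tilde{O}^T = I - P_{Z^T Z}.$$

Thus, since $\lambda_{\min}^{-1} \le \sum_{j=1}^{r} \lambda_{u_j}^{-1}$,

$$Q_\lambda^{-1} \preceq (d_{\max}^2 + \lambda_e^{-1}) (Z^T Z)^{+} + (I - P_{Z^T Z}) \sum_{j=1}^{r} \lambda_{u_j}^{-1},$$

and finally,

$$\mathrm{tr}(R_i Q_\lambda^{-1} R_i^T) \le (d_{\max}^2 + \lambda_e^{-1})\, \mathrm{tr}(R_i (Z^T Z)^{+} R_i^T) + \mathrm{tr}(R_i (I - P_{Z^T Z}) R_i^T) \sum_{j=1}^{r} \lambda_{u_j}^{-1}.$$

□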


Appendix C: Proof of Lemma 2 

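Lemma 2. $\mathrm{tr}(W \mathrm{Var}(\theta \mid \lambda) W^T) \le \lambda_e^{-1} \mathrm{rank}(Z) + d_{\max}^2 \mathrm{rank}(Z) + \mathrm{tr}(X \Sigma_\beta X^T)$.

Proof. R&H15 show that

$$\mathrm{tr}(W \mathrm{Var}(\theta \mid \lambda) W^T) = \mathrm{tr}(Z Q_\lambda^{-1} Z^T) + \mathrm{tr}(X T_\lambda^{-1} X^T) - \mathrm{tr}((I - M_\lambda) Z Q_\lambda^{-1} Z^T (I + M_\lambda)),$$

and that $\mathrm{tr}((I - M_\lambda) Z Q_\lambda^{-1} Z^T (I + M_\lambda)) \ge 0$. Hence,

$$\mathrm{tr}(W \mathrm{Var}(\theta \mid \lambda) W^T) \le \mathrm{tr}(Z Q_\lambda^{-1} Z^T) + \mathrm{tr}(X T_\lambda^{-1} X^T).$$

Next, note that $\Sigma_\beta^{-1} \preceq \lambda_e X^T X + \Sigma_\beta^{-1} = T_\lambda$. Hence, $T_\lambda^{-1} \preceq \Sigma_\beta$ and

$$\mathrm{tr}(X T_\lambda^{-1} X^T) \le \mathrm{tr}(X \Sigma_\beta X^T).$$

Now, from Lemma 5, we have

$$\frac{\lambda_e}{\lambda_e d_{\max}^2 + 1} Z^T Z + \Lambda \preceq \lambda_e Z^T M_\lambda Z + \Lambda = Q_\lambda,$$

and it follows that

$$\mathrm{tr}(Z Q_\lambda^{-1} Z^T) \le \mathrm{tr}\left( Z \left( \frac{\lambda_e}{\lambda_e d_{\max}^2 + 1} Z^T Z + \Lambda \right)^{-1} Z^T \right).$$

Finally, using Lemma 3 from R&H15, we have

$$\mathrm{tr}\left( Z \left( \frac{\lambda_e}{\lambda_e d_{\max}^2 + 1} Z^T Z + \Lambda \right)^{-1} Z^T \right) \le \left( \frac{\lambda_e}{\lambda_e d_{\max}^2 + 1} \right)^{-1} \mathrm{rank}(Z) = \lambda_e^{-1} \mathrm{rank}(Z) + d_{\max}^2 \mathrm{rank}(Z).$$

□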


Appendix D: Proof of Lemma 3 

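Lemma 3. There exist finite constants $K_1$ and $K_2$, not depending on $\lambda$, such that $\|E[R_i u \mid \lambda]\| \le \sqrt{q_i}\, K_1$ for $i = 1, \ldots, r$, and $\|y - W E[\theta \mid \lambda]\| \le K_2$.

Proof. From (4) and Lemma 6, we have

$$\|E[u \mid \lambda]\| = \|\lambda_e Q_\lambda^{-1} Z^T (M_\lambda y - X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta)\|$$

$$\le \|\lambda_e Q_\lambda^{-1} Z^T M_\lambda y\| + \|\lambda_e Q_\lambda^{-1} Z^T X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta\|$$

$$\le \sum_{i=1}^{N} |\tilde{y}_i|\, C_i^* + \sum_{i=1}^{k} d_i |\eta_i|\, C_i^* =: K_1.$$

Now, for each $i \in \{1, \ldots, r\}$, we have

$$\|E[R_i u \mid \lambda]\| \le \|R_i\|\, K_1 = \sqrt{\mathrm{tr}(R_i^T R_i)}\, K_1 = \sqrt{q_i}\, K_1.$$

This proves the first part. Now, it follows from page 10 of R&H15 that

$$\|y - W E[\theta \mid \lambda]\| \le \|M_\lambda\| \|y\| + \|X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta\| + \|M_\lambda\| \|Z\| \|E[u \mid \lambda]\|.$$

Now, using Lemma 5, and the fact that $h_j \le 1$ for $j = 1, \ldots, N$, we have

$$\|M_\lambda\|^2 = \mathrm{tr}(M_\lambda^T M_\lambda) = \sum_{j=1}^{N} h_j^2 \le N.$$

Recall from the proof of Lemma 6 that $U^T X T_\lambda^{-1} \Sigma_\beta^{-1/2} V = R_\lambda$, and note that

$$\|R_\lambda\|^2 = \mathrm{tr}(R_\lambda^T R_\lambda) = \sum_{i=1}^{k} r_i^2 \le k\, d_{\max}^2.$$

Therefore,

$$\|X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta\| = \|U U^T X T_\lambda^{-1} \Sigma_\beta^{-1/2} V V^T \Sigma_\beta^{-1/2} \mu_\beta\| = \|U R_\lambda V^T \Sigma_\beta^{-1/2} \mu_\beta\|$$

$$\le \|U\| \|R_\lambda\| \|V^T \Sigma_\beta^{-1/2} \mu_\beta\| \le \sqrt{N} \sqrt{k}\, d_{\max} \|V^T \Sigma_\beta^{-1/2} \mu_\beta\|.$$

Putting all of this together, we have

$$\|y - W E[\theta \mid \lambda]\| \le \|M_\lambda\| \|y\| + \|X T_\lambda^{-1} \Sigma_\beta^{-1} \mu_\beta\| + \|M_\lambda\| \|Z\| \|E[u \mid \lambda]\|$$

$$\le \sqrt{N} \|y\| + \sqrt{N} \sqrt{k}\, d_{\max} \|V^T \Sigma_\beta^{-1/2} \mu_\beta\| + \sqrt{N} \|Z\| K_1 =: K_2. \qquad \square$$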

Appendix E: Proof of Lemma 4 

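Lemma 4. For each $i \in \{1, 2, \ldots, r\}$, we have $(\psi_{\max} \lambda_e + \lambda_{u_i})^{-1} I_{q_i} \preceq R_i Q_\lambda^{-1} R_i^T$.

Proof. Lemma 5 implies that $Z^T M_\lambda Z \preceq Z^T Z$. It follows that

$$Q_\lambda = \lambda_e Z^T M_\lambda Z + \Lambda \preceq \lambda_e Z^T Z + \Lambda \preceq \lambda_e \psi_{\max} I + \Lambda.$$

Thus,

$$(\lambda_e \psi_{\max} + \lambda_{u_i})^{-1} I_{q_i} = R_i (\lambda_e \psi_{\max} I + \Lambda)^{-1} R_i^T \preceq R_i Q_\lambda^{-1} R_i^T. \qquad \square$$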